CURVED INERTIA FRAMES:

VISUAL  ATTENTION  AND  PERCEPTUAL  ORGANIZATION 
USING  CONVEXITY  AND  SYMMETRY 


J.  Brian  Subirana-Vilanova 


Abstract: 

In  this  paper  we  present  an  approach  to  perceptual  organization  and  attention  based 
on Curved Inertia Frames (C.I.F.), a novel definition of “curved axis of inertia”. Such
a definition is novel because it is global and can detect curved axes; it can also be
used to compute a frame of reference of the shapes in an image useful for non-rigid
object recognition or to pull out interesting structures in the image. The scheme assigns
a saliency measure to each component of the reference frame that is a measure
of  its  relevance,  so  that  large,  smooth,  convex,  symmetric  and  central  parts  play  a 
more  central  role  in  the  description  of  the  shape.  One  of  the  remarkable  features  of 
the  scheme  is  its  tolerance  to  noisy  and  spurious  data. 

Several perceptual phenomena observed in humans such as grouping based on symmetry
or convexity and environmental bias in shape description can be supported
naturally in this scheme. The scheme also supports other operations such as finding
the most “interesting point” or “feature” in the image (for subsequent processing) or
defining  what  is  inside  and  what  is  outside  an  object.  An  extension  of  the  scheme  to 
find  long  and  smooth  ridges  on  an  arbitrary  surface  is  presented.  The  extension  is 
illustrated  in  the  problem  of  finding  salient  blobs  in  images  and  it  is  suggested  that 
similar  schemes  be  used  in  other  early  and  middle  level  vision  tasks. 

1 Introduction


The  Problem:  Finding  Reference  Frames 

A  shape  description  is  an  encoding  of  a  shape.  A  common  approach  is  to  describe 
the  points  of  the  shape  in  a  cartesian  coordinate  reference  frame  fixed  in  the  image 
(see  Figure  1).  An  alternative  is  to  center  the  frame  on  the  shape  so  that  a  canonical 
description  can  be  achieved.  For  some  shapes  this  can  be  obtained  by  orienting 
the  frame  of  reference  along  the  inertia  axis  of  the  shape  (see  Figure  1).  If  the 
objects  are  elongated  and  flexible,  we  suggest  another  alternative  that  might  be  more 
appropriate,  the  use  of  a  curved  frame  of  reference  (see  Figure  2).  Recognition  can  be 
done  using  a  canonical  description  of  the  shape  obtained  by  rotating  or  “unbending” 
the  shape  using  the  frame  as  an  anchor  structure  (see  Figure  2).  For  complex  shapes, 
a  part  decomposition  for  recognition  can  be  obtained  with  a  skeleton-like  frame  (e.g. 
[Connell  and  Brady  87],  see  Figure  3).  In  this  paper,  we  address  the  problem  of 
finding reference frames (a.k.a. skeletons, symmetry or distance transforms, Voronoi
diagrams  etc.),  for  a  variety  of  tasks  such  as  recognition,  attention,  figure-ground  and 
perceptual  organization.  Our  approach  is  based  on  Curved  Inertia  Frames,  a  novel 
definition  of  “curved  axis  of  inertia”. 


Other Applications Of Reference Frames: Perceptual Organization, Attention,
Feature and Corner Detection, Part Segmentation, and Shape
Description

The  use  of  reference  frames  need  not  be  restricted  to  recognition.  Non-recognition 
examples include: finding an exit path in the maze of Figure 4, finding the corner in
Figure  5,  finding  the  blob  in  Figure  6,  determining  figure-ground  relations  in  Figure 
7  and  finding  the  most  interesting  object  in  Figure  17.  These  examples  are  closely 
related to figure-ground relations and perceptual organization (a.k.a. grouping, selection
and segmentation), a process that computes regions of the image coming from one
single object (of interest if possible), with little detailed knowledge of the particular
objects present in the image. The main advantage of our scheme over previously presented
perceptual organization schemes [Marroquin 1976], [Witkin and Tenenbaum
1983], [Mahoney 1985], [Haralick and Shapiro 1985], [Lowe 1984, 1987], [Sha’ashua
and Ullman 1988], [Jacobs 1989], [Grimson 1990], [Subirana-Vilanova 1990] is that
it can find complete curved, symmetric and large structures directly on the edges of
the image without requiring features like straight segments or corners. In this context,
perceptual organization is related to part segmentation [Hollerbach 1975], [Marr
1977], [Duda and Hart 1973], [Binford 1981], [Hoffman and Richards 1984], [Vaina
and Zlateva 1990], [Badler and Bajcsy 1978], [Binford 1971], [Brooks, Russel and
Binford 1979], [Brooks 1981], [Biederman 1985], [Marr and Nishihara 1978], [Marr
1982], [Guzman 1969], [Pentland 1988] and [Waltz 1975] since we are interested in
finding an arrangement of structures in the image, not just in finding some of them.


Some  Reasons  Why  Finding  Reference  Frames  Is  Not  Trivial 

Finding  reference  frames  is  a  straightforward  problem  for  simple  geometric  shapes 
such  as  a  square  or  a  rectangle.  The  problem  becomes  difficult  for  shapes  that  do 
not  have  a  clear  symmetry  axis  such  as  a  notched  rectangle  (for  some  more  examples 
see  Figures  2,  8,  9,  and  15)  and  none  of  the  schemes  presented  previously  can  handle 
them  successfully.  Ultimately,  we  would  like  to  achieve  human-like  performance. 
This is difficult partly because what humans consider to be a good skeleton can be
influenced  by  high-level  knowledge  (see  Figure  8). 


Previous  Work 


The  study  of  reference  frames  has  received  considerable  attention  in  the  computer 
vision  literature.  Reference  frames  have  been  used  for  different  purposes  (as  discussed 
above) and given different names (e.g. skeletons, Voronoi diagrams, symmetry transforms).
Previous schemes for computing skeletons usually fall into one of two classes.
The first class looks for a straight axis, such as the axis of inertia. These methods are
global (the axis is determined by all the contour points) and produce a single straight
axis. The second class can find a curved axis along the figure, but the computation
is based on local information. That is, the axis at a given location is determined by
small pieces of contours surrounding this location. Examples of such schemes are,
to name but a few, Morphological Filters (see [Serra 82] for an overview), Distance
Transforms [Rosenfeld and Pfaltz 68], [Borgefors 86], [Arcelli, Cordella and Levialdi
81], Symmetric Axis Transform [Blum 67], [Blum and Nagel 78] and Smoothed Local
Symmetries [Brady and Asada 84], [Connell and Brady 87]. Recently, computations
based  on  physical  models  have  been  proposed  by  [Brady  and  Scott  88]  and  [Scott, 
Turner  and  Zisserman  89].  In  contrast,  the  novel  scheme  presented  in  this  paper, 
which  we  call  Curved  Inertia  Frames  (C.I.F.),  can  extract  curved  symmetry  axes, 
and  yet  use  global  information. 

Outline


The  approach  that  we  present  for  finding  skeletons  is  divided  into  two  successive 
stages.  In  Section  3,  we  present  the  first  stage,  in  which  we  obtain  two  local  measures 
at  every  point,  the  inertia  value  and  the  tolerated  length,  which  will  provide  a  local 
symmetry  measure  at  every  point,  and  for  every  orientation.  This  measure  is  high  if 
locally  the  point  in  question  appears  to  be  a  part  of  a  symmetry  axis.  This  simply 
means  that,  at  the  given  orientation,  the  point  is  equally  distant  from  two  image 
contours. The symmetry measure therefore produces a map of potential fragments
of  symmetry  curves  which  we  call  the  inertia  surfaces.  In  Sections  4  and  5,  we 
present  the  second  stage  in  which  we  find  long  and  smooth  axes  going  through  points 
of  high  inertia  values  and  tolerated  length.  In  section  6  we  introduce  the  skeleton 
sketch  and  show  some  results  and  applications  of  the  scheme,  and  in  section  7  we 
discuss  the  relation  of  our  scheme  to  human  perception.  We  conclude  in  section  8  by 
presenting  an  extension  of  the  scheme  to  find  high,  long,  and  smooth  curves  on  an 
arbitrary  surface.  The  extension  is  illustrated  on  the  problem  of  finding  salient  blobs 
in  images.  In  section  8  we  also  present  some  limitations  of  our  scheme  and  a  number 
of  topics  for  future  research. 

In  Appendix  I  we  prove  a  theorem  that  shows  some  strong  limitations  on  the 
class  of  measures  computable  by  the  computation  described  in  sections  4  and  5. 

Figure 1: Left: a shape described in an image or viewer centered reference frame.
Center: the same shape with an object centered reference frame superimposed on
it.  Right:  a  canonical  description  of  the  shape. 

2  Five  Problems  With  Previous  Approaches 


Previously  presented  computations  for  finding  a  curved  axis  generally  suffer  from 
one  or  more  of  the  following  problems:  first,  they  produce  disconnected  skeletons  for 
shapes  that  deviate  from  perfect  symmetry  or  that  have  fragmented  boundaries  (see 


Figure 2: Which two of the three shapes on the left are more similar? One way
of  answering  this  question  is  by  “unbending”  the  shapes  using  their  skeleton  as  a 
reference  frame,  which  results  in  the  three  shapes  on  the  right.  Once  the  shapes 
have  been  unbent,  it  can  be  concluded  using  simple  matching  procedures  that 
two  of  them  have  similar  “shapes”  and  that  two  others  have  similar  length.  We 
suggest  that  the  recognition  of  elongated  flexible  objects  can  be  performed  in  some 
cases  by  transforming  the  shape  to  a  canonical  form  and  that  this  transformation 
can  be  achieved  by  unbending  the  shape  using  its  skeleton  as  an  anchor  structure. 
The unbending presented in this figure was obtained using an implemented Lisp
program. 


Figure  5);  second,  the  obtained  skeleton  can  change  drastically  for  a  small  change  in 
the  shape  (e.g.  a  notched  rectangle  vs  a  rectangle)  making  these  schemes  unstable; 
third,  they  do  not  assign  any  measure  to  the  different  components  of  the  skeleton  that 
indicates  the  “relative”  relevance  of  the  different  components  of  the  shape;  fourth, 
many  computations  depend  on  scale,  introducing  the  problem  of  determining  the 
correct  scale;  and  fifth,  it  is  unclear  what  to  do  with  curved  or  somewhat-circular 
shapes  because  they  do  not  have  a  clear  symmetry  axis. 

Consider  for  example,  the  Symmetric  Axis  Transform  [Blum  67].  The  SAT  of  a 
shape is the set of points such that there is a circle centered at the point that is
tangent to the contour of the shape at two points and that does not contain any
portion of the boundary of the shape, see [Blum 67] for details. An elegant way of
computing the SAT is by using the brushfire algorithm which can be thought of as
follows:  A  fire  is  lit  at  the  contour  of  the  shape  and  propagated  towards  the  inside 
of  the  shape.  The  SAT  will  be  the  set  of  points  where  two  fronts  of  fire  meet.  The 
Smoothed  Local  Symmetries  [Brady  and  Asada  84]  are  defined  in  a  similar  way  but, 
instead  of  taking  the  center  point  of  the  circle,  the  point  that  lies  at  the  center  of  the 
segment  between  the  two  tangent  points  is  the  one  that  belongs  to  the  SLS  and  the 
circle need not be inside the shape. In order to compute the SAT or SLS of a shape
we  need  to  know  the  tangent  along  the  contours  of  the  shape.  Since  the  tangent 
is  a  scale  dependent  measure  so  is  the  SLS.  One  of  the  most  common  problems 

Figure 3: Drawings of a woman, a horse and a rider, a man and a crocodile made
by Tallensi tribes (adapted from [Deregowski 1989]). See also [Marr and Nishihara
1978].

Figure 4: Left: Maze. Center: A portion of the maze is highlighted with a
discontinuous dotted line. Right: A candidate description of the highlighted region
that may be useful to find an exit to the maze. In this paper, we are interested
in finding frames or skeletons that can yield high-level descriptors like the one
shown. 


(the  first  problem  above)  in  skeleton  finding  computations  is  the  failure  to  tolerate 
noisy  or  circular  shapes  which  often  results  in  disconnected  and  distorted  frames.  A 
notched  rectangle  is  generally  used  to  illustrate  this  point,  see  [Serra  1982],  [Brady 
and  Connell  1987]  or  [Bagley  1985]  for  some  more  examples.  [Heide  1984],  [Bagley 

Figure 6: Finding the bent blob in the left image would be easy if we had the bent
frame  shown  in  the  center.  Right:  Another  blob  defined  by  orientation  elements  of 
a  single  orientation.  The  scheme  presented  in  this  paper  needs  some  modifications 
before  it  can  attempt  to  segment  the  blob  on  the  right  (see  text). 

1985], [Brady and Connell 1987], [Fleck 1985, 1986, 1988], [Fleck 1989] suggest
solving this stability problem by working on the obtained SLS, eliminating the portions
of it that are due to noise, connecting segments that come from adjacent parts of the
shape and smoothing the contours at different scales. In our scheme, symmetry
gaps are closed automatically, we look for the largest scale available in the image, and
the frame depends on the entire contour, not just a small portion, making the scheme
robust to small changes in the shape.
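
As an illustration of the brushfire idea described above for the SAT, the following Python sketch lights a fire at the background of a binary grid, propagates it inwards, and marks the pixels where fronts meet. It is only a crude discrete approximation under our own assumptions: the names `brushfire` and `fire_meeting_points` are hypothetical, the grid is assumed to contain at least one background pixel, and arrival times are city-block distances rather than the Euclidean distances implied by the circle-based definition.

```python
from collections import deque

# 4-connected neighbourhood used to propagate the fire (an assumption of this sketch).
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def brushfire(mask):
    """Return the arrival time of the fire at every pixel.

    mask[y][x] is True inside the shape. The fire starts at all background
    pixels (time 0) and advances one pixel per step.
    """
    rows, cols = len(mask), len(mask[0])
    dist = [[None] * cols for _ in range(rows)]
    queue = deque()
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x]:
                dist[y][x] = 0
                queue.append((y, x))
    while queue:
        y, x = queue.popleft()
        for dy, dx in NEIGHBOURS:
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols and dist[ny][nx] is None:
                dist[ny][nx] = dist[y][x] + 1
                queue.append((ny, nx))
    return dist


def fire_meeting_points(mask):
    """Shape pixels that no neighbour reaches later: a rough stand-in for the SAT."""
    dist = brushfire(mask)
    rows, cols = len(mask), len(mask[0])
    points = set()
    for y in range(rows):
        for x in range(cols):
            if not mask[y][x]:
                continue
            around = [dist[y + dy][x + dx] for dy, dx in NEIGHBOURS
                      if 0 <= y + dy < rows and 0 <= x + dx < cols]
            if all(d <= dist[y][x] for d in around):
                points.add((y, x))
    return points
```

On a small rectangle the marked pixels form the central segment plus the corner pixels, which loosely mirrors the branches of the SAT towards the corners. The sketch also makes the instability discussed above easy to observe: a one-pixel notch in the boundary changes the arrival times and therefore the marked set.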

SAT and SLS perform poorly on circular shapes. [Fleck 86] addressed this problem by
designing  a  separate  computation  to  handle  circular  shapes,  the  Local  Rotational 
Symmetries.  Our  scheme  has  a  preference  for  the  vertical  that  will  bias  the  frame 

Figure 7: This Figure illustrates the importance of symmetry and convexity in
grouping.  The  curves  in  the  left  image  are  grouped  together  based  on  symmetry. 
On  the  right  image,  convexity  overrides  symmetry,  after  [Kanizsa  and  Gerbino 
76].  This  grouping  can  be  performed  with  the  network  presented  in  this  paper  by 
looking  for  the  salient  axes  in  the  image. 



Figure  8:  All  the  shapes  in  this  Figure  have  been  drawn  by  adding  a  small  seg¬ 
ment  to  the  shape  in  the  middle.  At  a  first  glance,  all  of  these  shapes  would 
be  interpreted  as  two  blobs.  But  if  we  are  told  that  they  are  letters  then  finer 
distinctions  are  made  between  them.  When  we  use  such  high  level  knowledge 
we perceive these shapes as being different and therefore their associated skeletons
would  differ  dramatically. 


towards  a  vertical  line  in  circular  shapes.  When  the  shape  is  composed  of  a  long 
straight  body  attached  to  a  circular  one  (e.g.  a  spoon)  then  the  bias  will  be  towards 
having  only  one  long  axis  in  the  direction  of  the  long  body. 


3 Inertia Surfaces and Tolerated Length


If  we  are  willing  to  restrict  the  frame  to  a  single  straight  line  then  the  axis  of 
least  inertia  is  a  good  reference  frame  because  it  provides  a  connected  skeleton  and 
it can handle nonsymmetric connected shapes. The inertia $\mathrm{In}(SL, A)$ of a shape $A$
with respect to a straight line $SL$ is defined as (see Figure 10):

$$\mathrm{In}(SL, A) = \int_A \mathcal{D}(a, SL)^2 \, da \qquad (1)$$

The integral is extended over all the area of the shape, and $\mathcal{D}(a, SL)$ denotes the
distance from a point $a$ of the shape to the line $SL$. The axis of least inertia of a
shape $A$ is defined as the straight line $SL$ minimizing $\mathrm{In}(SL, A)$.
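
The straight axis of least inertia defined above has a well-known closed form for a discretized shape: it passes through the centroid, and its orientation follows from the second-order central moments. The Python sketch below illustrates this under our own assumptions (the shape is given as a list of pixel coordinates, and the function names `axis_of_least_inertia` and `inertia` are hypothetical); the integral of Equation 1 is approximated by a sum over pixels.

```python
import math


def axis_of_least_inertia(points):
    """Return (centroid, theta) for a shape given as a list of (x, y) pixels.

    theta is the orientation, in radians, of the line through the centroid
    that minimizes the sum of squared distances to the pixels.
    """
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    mu20 = sum((x - cx) ** 2 for x, _ in points)
    mu02 = sum((y - cy) ** 2 for _, y in points)
    mu11 = sum((x - cx) * (y - cy) for x, y in points)
    theta = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), theta


def inertia(points, origin, theta):
    """Discrete version of Equation 1 for the line through origin at angle theta."""
    ox, oy = origin
    s, c = math.sin(theta), math.cos(theta)
    # Perpendicular distance of (x, y) to the line is |-(x - ox) s + (y - oy) c|.
    return sum((-(x - ox) * s + (y - oy) * c) ** 2 for x, y in points)
```

For an elongated axis-aligned rectangle the returned orientation is along the long side, and the inertia about that axis is smaller than the inertia about the perpendicular line through the centroid.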

A  naive  way  of  extending  the  definition  of  axis  of  least  inertia  to  handle  bent 
curves  would  be  to  use  Equation  1,  so  that  the  skeleton  be  defined  as  the  curve  C 
minimizing  In(C,A).  This  definition  is  not  useful  if  C  can  be  any  arbitrary  curve 
because  a  highly  bent  curve  that  goes  through  all  points  inside  the  shape  would  have 
zero  inertia  (see  Figure  11).  There  are  two  possible  ways  to  avoid  this  problem: 
either we define a new measure that penalizes such curves or we restrict the set of
permissible curves. We chose the former approach and we call the new measure
defined in this paper (see Equation 4) the inertia, the skeleton saliency or saliency
of the curve. The skeleton saliency of a curve will depend on two local measures:
the inertia value $\mathcal{I}$, which will play a role similar to that of the distance $\mathcal{D}$ in Equation 1, and
the tolerated length $T$, which will prevent non-smooth curves from receiving optimal
saliency values. The saliency of a curve will be defined for any curve $C$ of length $L$
starting  at  a  given  point  p  in  the  image.  We  define  the  problem  as  a  maximization 
problem  so  that  the  “best”  skeleton  will  be  the  curve  that  has  the  highest  saliency 
value.  By  best  we  mean  that  the  skeleton  corresponds  to  the  “most  central  curve” 
in  the  “most  interesting  (i.e.  symmetric,  convex,  large)”  portion  of  the  image. 


The  inertia  value 

The inertia measure $\mathcal{I}$ for a point $p$ and an orientation $\alpha$ is defined as (see Figure 12):

$$\mathcal{I}(p, \alpha) = 2R \left( \frac{R - r}{R} \right)^{s}$$

Figure 12 shows how $r$, $R$ and the inertia surfaces are defined for a given orientation
$\alpha$: $R = d(p_l, p_r)/2$ and $r = d(p, p_c)$, where $p_l$ and $p_r$ are the closest points of the
contour that intersect with a straight line perpendicular to $\alpha$ (i.e. with orientation
$\alpha + \pi/2$) that goes through $p$ in opposite directions and $p_c$ is the midpoint of the
interval  between  these  two  points.  For  a  given  orientation,  the  inertia  values  of  the 
points  in  the  image  form  a  surface  that  we  call  the  inertia  surface  for  that  orientation. 
Figure  11  illustrates  why  the  inertia  values  should  depend  on  the  orientation  of  the 
skeleton  and  Figure  13  shows  the  inertia  surfaces  for  a  square  at  four  orientations. 

Local  maxima  on  the  inertia  values  for  one  orientation  indicate  that  the  point  is 
centered  in  the  shape  at  that  orientation.  The  absolute  value  of  the  local  maximum 
indicates  how  large  the  section  of  the  body  is  at  that  point  for  the  given  orientation, 
so that points in large sections of the body receive higher inertia values. The constant
$s$, or symmetry constant (2 in the actual implementation), controls the decrease in the
inertia values for points away from the center of the corresponding section: the larger
$s$ is, the larger the decrease. If $s$ is very large only center points obtain high values
and if $s = 0$ all points of a section receive the same value.
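
To make the role of $R$, $r$ and $s$ concrete, the following Python sketch computes an inertia surface for a single orientation on a binary grid. It rests on assumptions of ours: the inertia value is taken to be $2R((R - r)/R)^s$, which matches the behaviour described above for $s = 0$ and for large $s$; the axis orientation is horizontal, so the perpendicular probe runs along each column; pixels outside the shape receive the preassigned value 0; and the names `inertia_value` and `inertia_surface_horizontal` are hypothetical.

```python
def inertia_value(R, r, s=2):
    """Assumed inertia value 2R((R - r)/R)^s for half-width R and offset r."""
    if R <= 0:
        return 0.0
    return 2.0 * R * ((R - r) / R) ** s


def inertia_surface_horizontal(mask, s=2):
    """Inertia surface for a horizontal axis orientation.

    mask[y][x] is True inside the shape. For every column, each vertical run
    of shape pixels is treated as one section: its two ends play the role of
    the contour points and its middle is the midpoint between them.
    """
    rows, cols = len(mask), len(mask[0])
    surface = [[0.0] * cols for _ in range(rows)]
    for x in range(cols):
        y = 0
        while y < rows:
            if not mask[y][x]:
                y += 1
                continue
            top = y
            while y < rows and mask[y][x]:
                y += 1
            bottom = y - 1
            R = (bottom - top + 1) / 2.0      # half the width of the section
            centre = (top + bottom) / 2.0     # midpoint between the contour points
            for yy in range(top, bottom + 1):
                surface[yy][x] = inertia_value(R, abs(yy - centre), s)
    return surface
```

With `s=2` the values peak on the central row of a rectangle and fall off towards its sides, while with `s=0` every pixel of a section receives the full width $2R$.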


Figure 9: Skeletons found for a rectangle and a notched rectangle by SAT (left)
and SLS (right). Observe that the skeleton is highly distorted by the presence of
the notch. The skeleton found by the computation presented in this paper on these
shapes can be seen in Figures 7 and 8.


The  tolerated  length 

Figure  11  provides  evidence  that  the  curvature  on  a  skeleton  should  depend  on 
the width of the shape. As mentioned above, the tolerated length $T$ will be used to
evaluate  the  smoothness  of  a  frame  so  that  the  curvature  that  is  tolerated  depends  on 
the  width  of  the  section  allowing  high  curvature  only  on  thin  sections  of  the  shape. 

Figure 11: Left: A rectangle and a curve that would receive very low inertia
according  to  Equation  1.  Center:  Evidence  that  the  inertia  value  of  a  point 
should  depend  on  orientation.  Right:  Evidence  that  the  tolerated  curvature  on  a 
skeleton  should  depend  on  the  width  of  the  shape. 


Figure 12: This Figure shows how the inertia surfaces are defined for a given
orientation $\alpha$. The value for the surface at a point $p$ is $\mathcal{I}(R, r)$. The function $\mathcal{I}$ or
inertia function is defined in the text. $R = d(p_l, p_r)/2$ and $r = d(p, p_c)$, where
$p_l$ and $p_r$ are the points of the contour that intersect with a straight line perpendicular
to $\alpha$ that goes through $p$ in opposite directions and $p_c$ is the midpoint of
the  interval  between  these  two  points.  If  there  is  more  than  one  intersection  along 
one  direction  then  we  use  the  nearest  one.  If  there  is  no  intersection  at  all  then 
we  give  a  preassigned  value  to  the  surface,  0  in  the  current  implementation. 


The saliency of a curve will be the sum of the inertia values “up to” the tolerated
length  so  that  for  a  high  tolerated  length,  i.e.  low  curvature,  the  sum  will  include 
more  terms  and  will  be  higher.  The  objective  is  that  a  curve  that  bends  into  itself 
within  a  section  of  the  shape  have  a  point  within  the  curve  with  0  tolerated  length  so 
that  the  saliency  of  the  curve  will  not  depend  on  the  shape  of  the  curve  beyond  that 
point.  In  other  words,  T  should  be  0  when  the  radius  of  curvature  of  the  “potential” 
skeleton  is  smaller  than  the  width  of  the  shape  at  that  point  and  a  positive  value 
otherwise (with an increasing magnitude the smoother the curve is).
We define the tolerated length $T$ for a curvature of radius $r_c$ as:

$$T(p, \alpha, r_c) = \begin{cases} 0 & \text{if } r_c < R + r \\ r_c \left( \pi - \arccos(\ldots) \right) & \text{otherwise} \end{cases}$$

If a curve has a point with a radius of curvature smaller than the width of the
shape its tolerated length will be 0 and this, as we will see, results in a non-optimal
curve.*

In  this  section  we  have  introduced  the  inertia  surfaces  and  the  tolerated  length. 
We  will  define  a  salient  frame  of  reference  to  be  a  high  and  long  curve  in  the  inertia 
surfaces  that  is  as  smooth  as  possible  based  on  the  tolerated  length.  Our  approach 
is  to  associate  a  measure  to  any  curve  in  the  plane  and  to  find  the  one  that  yields 
the  highest  possible  value.  The  inertia  value  will  be  used  to  ensure  that  curves  close 
to  the  center  of  large  portions  of  the  shape  receive  high  values.  The  tolerated  length 
will be used to ensure that curves bending beyond the width of the shape receive low
values.  In  the  next  section  we  will  investigate  how  such  a  curve  might  be  computed 
in  a  general  framework  and  in  section  5  we  will  see  how  to  include  the  inertia  values 
and  the  tolerated  length  in  the  computation  and  what  is  the  definition  of  the  saliency 
measure  that  results. 


4  A  Network  to  Find  Salient  Curves 


In  this  section  we  will  derive  a  class  of  dynamic  programming  algorithms  that  find 
curves  in  an  arbitrary  graph  that  maximize  a  certain  quantity.  In  the  next  section  we 
will  apply  these  algorithms  to  finding  long  and  smooth  ridges  in  the  inertia  surfaces. 
[Mahoney  87]  showed  that  long  and  smooth  curves  in  binary  images  are  salient  in 
human  perception  even  if  they  have  multiple  gaps  and  in  the  presence  of  other  curves. 
[Sha’ashua and Ullman 88] devised a saliency measure and a dynamic programming
algorithm  that  can  find  such  salient  curves  in  a  binary  image.  We  build  on  their  work 
and  show  how  their  ideas  can  be  extended  to  deal  with  arbitrary  surfaces.  In  this 
section  we  will  examine  their  computation  in  a  way  geared  at  demonstrating  that  the 
kind  of  saliency  measures  that  can  be  computed  with  the  network  is  very  limited. 
The  actual  proof  of  this  will  be  given  in  Appendix  I. 

*  Because  of  this,  if  a  simply  connected  closed  curve  has  a  radius  of  curvature  lying  fully  inside 
the curve then it will not be optimal. Unfortunately I have not been able to prove that any simply
connected  closed  curve  has  such  a  point  nor  that  there  is  a  curve  with  such  a  point. 


Figure  13:  Plots  of  the  inertia  surfaces  for  a  square  for  orientations  parallel  to  the 
sides (left two plots) and parallel to the diagonals (right two plots).

We define a directed graph with properties $G = (V, E, P_E, P_J)$ as a graph with a
set of vertices $V = \{v_i\}$; a set of edges $E = \{e_{ij} = (v_i, v_j) \mid v_i, v_j \in V\}$; a function
$P_E$ that assigns a real-valued vector $p_e$ of properties to each edge in $E$; and a function
$P_J$ that assigns a real-valued vector $p_j$ of properties to each junction in $J$, where a junction is
a pair of adjacent edges (i.e. any pair of edges that share a vertex) and $J$ is the set of
all junctions. We will refer to a curve in the graph as a sequence of connected edges.
We assume that we have a saliency function $S$ that associates a positive integer $S(C)$
with  each  curve  C  in  the  graph.  This  integer  is  the  saliency  or  saliency  value  of 
the  curve.  The  saliency  of  a  curve  will  be  defined  in  terms  of  the  properties  of  the 
elements  (vertices,  edges  and  junctions)  of  the  curve. 


Our  problem  is  to  find  a  computation  that  finds  for  every  point  and  each  of  its 
connecting  edges,  the  most  salient  curve  starting  at  that  point  with  that  edge.  This 
includes  defining  a  saliency  function  and  a  computation  that  will  find  the  salient 
curves  for  that  function.  The  applications  that  will  be  shown  here  work  with  a  2 
dimensional  grid.  The  vertices  are  the  points  in  the  grid  and  the  edges  the  elements 
that  connect  the  different  points  in  the  grid.  The  junctions  will  be  used  to  include 
in  the  saliency  function  properties  of  the  shape  of  the  curve  such  as  curvature. 

The  computation  will  be  performed  in  a  locally  connected  parallel  network  with 
a processor for every edge $e_{ij}$. The processors corresponding to the incoming
edges  of  a  given  vertex  will  be  connected  to  those  corresponding  to  the  connecting 
edges  at  that  vertex.  We  will  design  the  computation  so  that  we  know  at  iteration  n 
what  is  the  saliency  of  the  most  salient  curve  of  size  n  for  every  edge.  This  provides 
a  constraint  in  the  invariant  of  the  algorithm  that  we  are  seeking  that  will  guide  us 
to  the  final  algorithm.  In  order  for  the  computation  to  have  some  computing  power 
each processor $pe_{i,j}$ must have at least one state variable that we will denote as $s_{i,j}$.
Since we want to know the saliency of the most salient curve of length $n$ starting with
any given edge, we will assume that, at iteration $n$, $s_{i,j}$ contains that value for that
edge. Having only one variable looks like a big restriction; however, we
show in Appendix I that allowing more state variables does not add any power to the
possible  saliency  functions  that  can  be  computed  with  this  network.  Since  the  saliency 
of  a  curve  is  defined  only  by  the  properties  of  the  elements  in  the  curve,  it  cannot  be 
influenced  by  properties  of  elements  outside  the  curve.  Therefore  the  computation 
to  be  performed  can  be  expressed  as: 


$$s_{i,j}(n + 1) = \mathrm{MAX}\{ \mathcal{F}(n + 1, p_e, p_j, s_{i,j}(n), s_{j,k}(n)) \mid (j, k) \in E \}$$

$$s_{i,j}(0) = \mathcal{F}(0, p_e, p_j, 0, 0) \qquad (2)$$

where $\mathcal{F}$ is the function that will be computed in every iteration and that will lead
to the computed saliency. Observe that given $\mathcal{F}$, the saliency value of any curve can
be found by applying $\mathcal{F}$ recursively on the elements of the curve.

We  are  now  interested  in  what  types  of  saliency  functions  S  we  can  use  and 
what type of functions $\mathcal{F}$ are needed to compute them such that the value obtained
in the computation is the maximum for the resulting saliency measure $S$. Using
contradiction and induction we conclude that a function $\mathcal{F}$ will compute the most
salient curve for all possible graphs if and only if it is monotonically increasing in its
last argument, i.e. iff

$$\forall p, x, y: \quad x < y \implies \mathcal{F}(p, x) < \mathcal{F}(p, y) \qquad (3)$$

where $p$ is used to abbreviate the first four arguments of $\mathcal{F}$.

What type of functions $\mathcal{F}$ satisfy this condition? We expect them to behave freely
as $p$ varies. And when $x$ varies, we expect $\mathcal{F}$ to change in the same direction with
an amount that depends on $p$. A simple way to fulfill this condition is with the
following function:

$$\mathcal{F}(p, x) = f(p) + g(x) * h(p) \qquad (4)$$

where $f$, $g$ and $h$ are positive functions and $g$ is monotonically increasing.

We now know what type of function $\mathcal{F}$ we should use but we do not know what
type of saliency measures we can compute. Let us start by looking at the saliency $S_i$
that  we  would  compute  for  a  curve  of  length  i.  For  simplicity  we  assume  that  g  is 
the  identity  function: 


• Iter. 1: $S_1 = f(p_{1,2})$

• Iter. 2: $S_2 = S_1 + f(p_{2,3}) * h(p_{1,2})$

• Iter. 3: $S_3 = S_2 + f(p_{3,4}) * h(p_{1,2}) * h(p_{2,3})$

• Iter. 4: $S_4 = S_3 + f(p_{4,5}) * h(p_{1,2}) * h(p_{2,3}) * h(p_{3,4})$

• Iter. $i$: $S_i = S_{i-1} + f(p_{i,i+1}) * \prod_{k=1}^{i-1} h(p_{k,k+1}) = \sum_{l=1}^{i} f(p_{l,l+1}) * \prod_{k=1}^{l-1} h(p_{k,k+1})$.
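
The iterations listed above can be checked numerically. The short Python sketch below, with hypothetical function names and with $g$ taken to be the identity as in the text, evaluates the saliency of a single curve in two ways: as the accumulated sum in which each $f$ term is weighted by the product of the preceding $h$ terms, and by applying $\mathcal{F}(p, x) = f(p) + x * h(p)$ recursively from the far end of the curve back to its start. The two values should agree.

```python
from math import prod


def saliency_closed_form(f_vals, h_vals):
    """Sum of f values, each weighted by the product of the h values before it."""
    total = 0.0
    for l, f_l in enumerate(f_vals):
        total += f_l * prod(h_vals[:l])   # prod of an empty slice is 1
    return total


def saliency_recursive(f_vals, h_vals):
    """Apply F(p, x) = f(p) + x * h(p) from the last element back to the first."""
    x = 0.0
    for f_l, h_l in zip(reversed(f_vals), reversed(h_vals)):
        x = f_l + x * h_l
    return x
```

For example, with `f_vals = [1, 2, 3]` and every `h` equal to 0.5, both functions give 1 + 2 * 0.5 + 3 * 0.25 = 2.75. Note that the `h` value of the last element never contributes, since nothing follows it on the curve.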


At step n, the network will know about the most salient curve of length n starting
from  any  edge.  Recovering  the  most  salient  curve  from  a  given  point  can  be  done  by 
tracing  the  links  chosen  by  the  processors  (from  Equation  2). 
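
A minimal sequential simulation of this network, written in Python, is sketched below. It is an illustration under our own assumptions rather than the implemented system: the update uses $\mathcal{F}(p, x) = f(p) + x * h(p)$ with $g$ the identity, `f` is a dictionary of edge values, `h` is a callable on junctions (pairs of consecutive edges), an edge with no successor simply keeps its own value, and the names `most_salient_curves` and `trace` are hypothetical.

```python
def most_salient_curves(edges, f, h, n_iters):
    """Iterate the saliency update for every directed edge.

    edges: list of (i, j) vertex pairs; f: dict edge -> positive value;
    h: callable (edge, next_edge) -> coupling factor.
    Returns the final state of every edge and the links chosen at each iteration.
    """
    succ = {e: [e2 for e2 in edges if e2[0] == e[1]] for e in edges}
    s = {e: f[e] for e in edges}          # curves made of a single edge
    links = []
    for _ in range(n_iters):
        new_s, chosen = {}, {}
        for e in edges:
            best, arg = f[e], None        # assumption: a curve may end at e
            for e2 in succ[e]:
                cand = f[e] + s[e2] * h(e, e2)
                if cand > best:
                    best, arg = cand, e2
            new_s[e], chosen[e] = best, arg
        s = new_s                         # all processors update in lockstep
        links.append(chosen)
    return s, links


def trace(edge, links):
    """Recover the curve starting at edge by following the chosen links."""
    curve = [edge]
    for chosen in reversed(links):
        nxt = chosen[curve[-1]]
        if nxt is None:
            break
        curve.append(nxt)
    return curve
```

The link chosen at the last iteration selects the second edge of the curve, the link chosen one iteration earlier selects the third, and so on, which is why `trace` walks the recorded links in reverse order.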

5 Finding Long And Smooth Ridges


In this section, we will show how the network defined in the previous section can
be  used  to  find  frames  of  reference  using  the  inertia  surfaces  and  the  tolerated  length 
as  defined  in  Section  3.  The  directed  graph  with  properties  that  defines  the  network 
has  one  vertex  for  every  pixel  in  the  image  and  one  edge  connecting  it  to  each  of 
its  neighbors  thus  yielding  a  locally  connected  parallel  network.  This  results  in  a 
network  that  has  eight  orientations  per  pixel.  The  number  of  orientations  per  pixel 
can  be  increased  to  improve  the  accuracy  of  the  output. 

The value computed is the sum of the f(p_ij)’s along the curve weighted by the
product of the h(p_ij)’s. Using 0 < h < 1 we can ensure that the total saliency will
be smaller than the sum of the f’s. One way of achieving this is by using h = 1/k or
h = exp(-k) and restricting k to be larger than 1. The f’s will then be a quantity
to be maximized and the k’s a quantity to be minimized along the curve. In our
skeleton network (presented in the next section), f will be the inertia measure and
k will depend on the tolerated length and will account for the shape of the curve, so
that the saliency of a curve is the sum of the inertia values along the curve weighted
by a number that depends on the overall smoothness of the curve. In particular, the
functions f, g and h (see Equation 4) are defined as:


• $f(p) = f(p_{ij}) = I(R, r)$,

• $g(x) = x$,

• and $h(p) = h(p_{ij}) = \rho^{\,l_{elem} / (\alpha T)}$.


α, which we call the circle constant, scales the tolerated length; it was set to
4 in the current implementation (because 4 · radius · π/2 is the length of the perimeter
of a circle). ρ, which we call the penetration factor, was set to 0.5 (so that inertia
values “half a circle” away get factored down by 0.5). l_elem is the length of the
corresponding element. Also, S_ij(0) = 0 (because the saliency of a skeleton of length
0 should be 0).

With this definition the saliency value assigned to a curve of length L is:

$S_L = \sum_{l=1}^{L} I(l) \prod_{k=1}^{l-1} \rho^{\,l_{elem} / (\alpha T(k))}
     = \sum_{l=1}^{L} I(l)\, \rho^{\sum_{k=1}^{l-1} l_{elem} / (\alpha T(k))}$

which is an approximation of the continuous value given in Equation 5 below. S_L is
the saliency of a parameterized curve C(u), and I(u) and T(u) are the inertia value
and the tolerated length respectively at point u of the curve.

$S_L = \int_0^L I(t)\, \rho^{\int_0^t \frac{du}{\alpha T(u)}}\, dt$    (5)

The  obtained  measure  favors  curves  that  lie  in  large  and  central  areas  of  the  shape 
and  that  have  a  low  overall  internal  curvature.  The  measure  is  bounded  by  the  area  of 
the  shape;  e.g.  a  straight  symmetry  axis  of  a  convex  shape  will  have  a  saliency  equal 
to  the  area  of  the  shape.  In  the  next  section  we  will  present  some  results  showing 
the  robustness  of  the  scheme  in  the  presence  of  noisy  shapes. 

Observe that if the tolerated length T(t) at one point C(t) is small then
$\int_0^t \frac{du}{\alpha T(u)}$ is large, so that $\rho^{\int_0^t \frac{du}{\alpha T(u)}}$
becomes very small (since ρ < 1) and so does the saliency S_L for the curve. Thus, a
small α or ρ penalizes curvature, favoring smoother curves.
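The effect of a small tolerated length can be seen in a short numerical sketch. It assumes that the attenuation per element is ρ raised to l_elem / (α T), which is our reading of the definitions above, and it takes the inertia values and tolerated lengths along the curve as given inputs (how they are obtained is described in Section 3). The function name and the sample values are hypothetical.

```python
def curve_saliency(inertia, tolerated, alpha=4.0, rho=0.5, l_elem=1.0):
    """Discrete saliency: sum over l of I_l * rho ** (sum_{k<l} l_elem / (alpha * T_k))."""
    total, exponent = 0.0, 0.0
    for I_l, T_l in zip(inertia, tolerated):
        total += I_l * rho ** exponent        # contribution attenuated by the path so far
        exponent += l_elem / (alpha * T_l)    # a small T_l adds a large penalty
    return total

inertia = [1.0] * 5
smooth = [10.0] * 5                     # large tolerated length everywhere
kinked = [10.0, 10.0, 0.1, 10.0, 10.0]  # one sharp bend in the middle
print(curve_saliency(inertia, smooth), curve_saliency(inertia, kinked))
```

In this sketch the bend does not reduce the contribution of the elements before it; it only attenuates everything that lies beyond it along the curve.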


Smoothing 

Straight lines that have an orientation different from one of the eight network
orientations  generate  curvature  impulses  due  to  the  discretization  imposed  on  them, 
essentially  45  or  90  degrees  (in  a  number  of  pixels,  per  unit  length,  which  can  be  made 
arbitrarily  large  with  a  finer  grid).  This  results  in  a  reduction  of  the  saliency  for  such 
curves  biasing  the  network  towards  certain  orientations.  To  prevent  this,  we  made 
an  implementation  of  the  network  that  included  a  smoothing  term  that  enables  the 
processors  to  change  their  orientation  at  each  iteration,  instead  of  keeping  only  one 
of  the  eight  initial  orientations.  At  each  iteration,  the  new  orientation  is  computed 
by  looking  at  those  nearby  pixels  of  the  curve  which  lie  on  a  straight  line  (so  that 
curvature  is  minimized). 

This  allows  greater  flexibility  but  at  the  expense  of  breaking  the  optimization 
relation  shown  in  Equation  3.  A  similar  problem  is  encountered  with  the  smoothing 
term  suggested  in  [Sha’ashua  and  Ullman  1988]. 
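The discretization effect that motivates the smoothing term can be illustrated with a small sketch. It digitizes a straight line on a pixel grid using 8-connected steps and counts how often consecutive steps change direction; each change is a 45 degree turn. The rounding rule and the function names are our own choices for illustration, and the sketch is restricted to lines with 0 <= dy <= dx and dx > 0.

```python
def chain_steps(dx, dy):
    """8-connected steps of a digitized line from (0, 0) to (dx, dy), 0 <= dy <= dx."""
    # Integer form of round-half-up applied to i * dy / dx.
    ys = [(2 * i * dy + dx) // (2 * dx) for i in range(dx + 1)]
    return [(1, ys[i + 1] - ys[i]) for i in range(dx)]

def count_turns(steps):
    """Number of places where two consecutive steps differ in direction."""
    return sum(a != b for a, b in zip(steps, steps[1:]))

for dy in (0, 4, 8):
    print(dy, count_turns(chain_steps(8, dy)))
```

Lines aligned with a network orientation (dy = 0 or dy = dx here) produce no turns, while an intermediate slope produces many, which is the source of the orientation bias described above.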

Figure  14:  a)  Rectangle,  b)  Skeleton  sketch  for  the  rectangle.  Circles  along  the 
contour  indicate  local  maxima  in  the  skeleton  sketch,  c)  Skeleton  sketch  for  the 
rectangle  for  one  particular  orientation,  vertical-down  in  this  case,  d)  Most  salient 
curve,  e)  Most  interesting  point  for  the  most  salient  curve. 


6  Results  and  Applications 


In this section we will present some results and applications of the frame
computation, and in the next section we will discuss the connections of our findings to
human perception.

The network described in the previous section has been implemented on a
Connection Machine and tried on a variety of images. As mentioned above, the
implementation works in two stages. First, the distance to the nearest point of the shape
is computed at different orientations all over the image so that the inertia surfaces
and the tolerated length can be computed; this requires a simple distance transform
of the image. In the second stage, the network described in section 5 computes the
saliency of the best curve starting at each point in the image for different orientations
(eight in the current implementation). The number of iterations needed is bounded
by the length of the most salient curve but in general a much smaller number of
iterations will suffice. In all the examples shown in this paper the images were 128
by 128 pixels and 128 iterations were used. However, in most of the examples, the
results do not change after about 40 iterations. In general, the number of iterations
is bounded by the width of the shape measured in pixels.
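The first stage can be sketched as a directional distance computation. This is a brute-force illustration, not the distance transform used in the implementation: `contour` is a hypothetical set of (row, column) contour pixels, and for one grid direction the function counts the steps from every pixel to the first contour pixel met along that direction.

```python
def directional_distance(contour, shape, direction):
    """Steps from each pixel to the first contour pixel along `direction`.
    Pixels whose ray leaves the image without a hit get None."""
    rows, cols = shape
    dr, dc = direction
    if (dr, dc) == (0, 0):
        raise ValueError("direction must be non-zero")
    dist = {}
    for r in range(rows):
        for c in range(cols):
            rr, cc, steps = r, c, 0
            while 0 <= rr < rows and 0 <= cc < cols and (rr, cc) not in contour:
                rr, cc, steps = rr + dr, cc + dc, steps + 1
            hit = 0 <= rr < rows and 0 <= cc < cols
            dist[(r, c)] = steps if hit else None
    return dist

contour = {(0, 0), (0, 4)}                     # two contour pixels on one row
right = directional_distance(contour, (1, 5), (0, 1))
left = directional_distance(contour, (1, 5), (0, -1))
print(right[(0, 1)], left[(0, 1)])
```

Running it once per direction and pairing opposite directions gives, for each pixel, two distances to the shape on either side, which is the kind of input the inertia surfaces need.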


Figure 15: Top: Four shapes: a notched square, a stump, a J, and Mach’s
demonstration. Second row: The most salient curve found by the network for each of
them. Observe that the scheme is very stable under noisy or bent shapes. Third
row: The most salient curve starting inside the shown circles. For the J shape the
curve shown is the most salient curve that is inside the shape. Fourth row: The
most interesting point according to the curves shown in the two previous rows.
See text for details.

The  skeleton  sketch  and  the  most  salient  curve 

The  skeleton  sketch  contains  the  saliency  value  for  the  most  salient  curve  at  each 
point. The skeleton sketch is similar to the saliency map described in [Sha’ashua and
Ullman 1988] and [Koch and Ullman 1985] because it provides a saliency measure at
every  point  in  the  image.  Figure  14  shows  the  skeleton  sketch  for  a  square.  The  best 
skeleton  can  be  found  by  tracing  the  curve  starting  at  the  point  having  the  highest 
skeleton  saliency  value.  Figure  15  shows  a  few  shapes  and  the  most  salient  curve 
found  by  the  network  for  each  of  them.  Observe  that  the  algorithm  is  very  robust 
in  the  presence  of  non  smooth  contours.  Given  a  region  in  the  image  we  can  find 
the  best  curve  that  starts  in  the  region  by  finding  the  maxima  of  the  skeleton  sketch 
in  the  region,  see  Figure  15.  In  general,  any  local  maximum  in  the  skeleton  sketch 
corresponds  to  a  curve  accounting  for  a  symmetry  in  the  image.  Local  maxima  in 
the  shape  itself  are  particularly  interesting  since  they  correspond  to  features  such  as 
corners. 


The  most  salient  point 

In many vision tasks, besides being interested in finding a salient skeleton, we are
interested in finding a particular point related to the curve, shape or image. This can
be for a variety of reasons: because it defines a point at which to start subsequent
processing of the curve, or because it defines a particular place to which to shift our
window of attention. Different points can be defined. The point with the highest
saliency value is one of them, because it can locate relevant features such as corners.

Another  interesting  place  in  the  image  is  the  most  central  point  in  a  curve  which 
can  be  computed  by  our  scheme  by  looking  for  the  saliency  values  along  the  curve  at 
both  directions  within  the  curve.  The  most  central  point  can  be  defined  as  the  point 
where these two values are “large and equal”. The point that maximizes min(p_l, p_r)
has been used in the current implementation; other functions are possible (see Figure
15 for some examples). Observe in Figure 15 that a given curve can have several
central points due to different local maxima. This point can be used to direct future
processing*.

Similarly,  the  most  central  point  in  the  image  can  be  defined  as  the  point  that 
maximizes min(p_l, p_r) for all orientations.
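The choice of central point along one curve can be sketched as follows. The sketch assumes that p_l and p_r are the saliencies accumulated from a given element toward each end of the curve, computed with the same recursion F(p, x) = f(p) + x · h(p) as the network; the function name and the sample values are hypothetical.

```python
def central_point(f_vals, h_vals):
    """Index along the curve that maximizes min(p_l, p_r)."""
    n = len(f_vals)
    p_r, acc = [0.0] * n, 0.0
    for i in range(n - 1, -1, -1):     # saliency of the part from i to the right end
        acc = f_vals[i] + h_vals[i] * acc
        p_r[i] = acc
    p_l, acc = [0.0] * n, 0.0
    for i in range(n):                 # saliency of the part from i to the left end
        acc = f_vals[i] + h_vals[i] * acc
        p_l[i] = acc
    return max(range(n), key=lambda i: min(p_l[i], p_r[i]))

print(central_point([1.0] * 5, [0.9] * 5))
```

Taking the minimum favors points where both directions are salient, so a point near one end of a long curve scores low even though one of its two values is large.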

* See also [Reisfeld, Wolfson and Yeshurun 1988], where a scheme to detect interest points was
presented. Their scheme was scale dependent, contrary to our scheme, which selects the larger
structure as the most interesting one, independently of the scale at which the scene is seen.
Figure 16: Left: Skeleton Sketch for Mach’s demonstration (Original image in
previous  Figure  top  right).  Center:  Skeleton  Sketch  for  one  orientation  only. 
Right:  Slice  of  the  “one-orientation”  Skeleton  Sketch  through  one  of  the  diagonals 
of  the  image.  Note  that  the  values  decrease  across  the  gaps  and  increase  inside 
the  square  (see  also  [Palmer  and  Bucher  1981]). 


Shape  description 

Each  locally  salient  curve  in  the  image  corresponds  to  a  symmetric  region  in  one 
portion  of  the  scene.  The  selection  of  the  set  of  most  interesting  frames  corresponding 
to  the  different  parts  of  the  shape  yields  a  part  description  of  the  scene.  Doing  this 
is not trivial (see [Sha’ashua and Ullman 1990]) because a salient curve is surrounded
by  other  curves  of  similar  saliency.  In  general,  a  curve  displaced  one  pixel  to  the 
side  from  the  most  salient  curve  will  have  a  saliency  value  similar  to  that  of  the 
most  salient  one  and  higher  than  that  of  other  locally  most  salient  curves.  In  order 
to  inhibit  these  curves  we  color  out  from  a  locally  maximal  curve  in  perpendicular 
directions  to  suppress  parallel  nearby  curves.  The  amount  to  color  can  be  determined 
by  the  average  width  of  the  curve.  Once  nearby  curves  have  been  suppressed  we  look 
for  the  next  most  salient  curve  and  iterate  this  process.  Another  approach  to  find  a 
group of several curves, not just one, is given in [Sha’ashua and Ullman 1990]. Both
approaches  suffer  from  the  same  problem:  the  groups  obtained  do  not  optimize  a 
simple  global  maximization  function. 
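The select-and-inhibit loop can be sketched in a simplified form. Instead of coloring out perpendicular to the curve, this illustration suppresses every candidate that has a pixel within the selected curve's average width in any direction, which is a cruder, isotropic stand-in. The candidate representation (dictionaries with `pixels`, `saliency` and `width`) is hypothetical.

```python
def select_frames(curves, max_frames):
    """Greedy selection: take the most salient curve, suppress its near
    neighbors, and repeat on what is left."""
    remaining = sorted(curves, key=lambda c: c["saliency"], reverse=True)
    selected = []
    while remaining and len(selected) < max_frames:
        best = remaining.pop(0)
        selected.append(best)
        reach = best["width"]

        def near(c):
            # True if any pixel of c lies within `reach` of the selected curve.
            return any(max(abs(r - br), abs(q - bq)) <= reach
                       for r, q in c["pixels"] for br, bq in best["pixels"])

        remaining = [c for c in remaining if not near(c)]
    return selected

curves = [
    {"name": "A", "pixels": [(r, 5) for r in range(5)], "saliency": 10.0, "width": 2},
    {"name": "B", "pixels": [(r, 6) for r in range(5)], "saliency": 9.5, "width": 2},
    {"name": "C", "pixels": [(r, 20) for r in range(5)], "saliency": 4.0, "width": 2},
]
print([c["name"] for c in select_frames(curves, 3)])
```

As the text notes, the resulting group is the outcome of a greedy procedure and does not maximize a single global function.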

Figure 17 shows the skeleton found for an airplane. The skeleton can then be
used to find a part description of the shape in which each component of the frame
has several associated elements describing it: a set of contours from the shape, a
saliency measure reflecting the relevance or saliency that the component has within
the shape, a central point, and a location within the shape.

Figure 17: Top: Image; airplane portion enlarged; its edges; airplane without
short edges. Bottom: Vertical inertia surface; skeleton sketch; skeleton; most
salient point.

Inside-outside 

The  network  can  also  be  used  to  determine  a  continuous  measure  of  inside-outside 
(see  also  [Subirana-Vilanova  and  Richards  1991]).  The  distance  from  a  point  to  the 
frame  can  be  used  as  a  measure  of  how  near  the  point  is  to  the  outside  of  the  shape. 
This  measure  can  be  computed  using  a  scheme  similar  to  the  one  used  to  inhibit 
nearby  curves  as  described  in  the  previous  paragraph:  coloring  out  from  the  frame  at 
perpendicular  orientations,  and  using  the  time  where  a  point  is  colored  as  a  measure 
of  how  far  from  the  frame  the  point  is.  The  saliency  of  a  curve  provides  a  measure 
of  the  area  swept  by  the  curve  which  can  be  used  to  scale  the  coloring  process. 
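The coloring process can be sketched as a breadth-first spread from the frame. This simplified version colors in all four grid directions rather than only perpendicular to the frame, and it does not scale the process by the saliency of the curve; the function name and the grid representation are assumptions made for illustration.

```python
from collections import deque

def coloring_time(frame, shape):
    """Breadth-first coloring outward from the frame pixels on a 4-connected grid.
    Returns the step at which each pixel is first colored."""
    rows, cols = shape
    time = {p: 0 for p in frame}
    queue = deque(frame)
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in time:
                time[(nr, nc)] = time[(r, c)] + 1
                queue.append((nr, nc))
    return time

frame = [(2, c) for c in range(5)]          # a horizontal frame in a 5 x 5 image
t = coloring_time(frame, (5, 5))
print(t[(2, 2)], t[(0, 3)], t[(4, 0)])
```

Pixels colored late are far from the frame and therefore closer to the outside of the shape, which gives the continuous inside-outside measure described above.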

7  Relation  to  human  perception 

The  skeleton  found  by  the  network  for  a  given  shape  corresponds  roughly  with 
the  central  regions  of  the  shape.  In  this  section  we  show  how  the  scheme  can  handle 
various  peculiarities  of  human  perception. 


Important  frames  of  reference  in  the  perception  of  shape  and  spatial  relations  by 
humans  include:  that  of  the  perceived  object,  that  of  the  perceiver  and  that  of  the 
environment.  In  this  paper  we  have  concentrated  on  the  first.  A  considerable  amount 
of effort has been devoted to studying the effects of orientation of such a frame (relevant
results  include,  to  name  but  a  few  [Attneave  1967],  [Shepard  and  Metzler  1971],  [Rock 
1973],  [Cooper  1976],  [Wiser  1980,  1981],  [Schwartz  1981],  [Shepard  and  Cooper  1982], 
[Jolicoeur  and  Landau  1984],  [Jolicoeur  1985],  [Palmer  1985],  [Palmer  and  Hurwitz 
1985],  [Corballis  and  Cullen  86],  [Maki  1986],  [Jolicoeur,  Snow  and  Murray  1987], 
[Parsons  and  Shimojo  1987],  [Robertson,  Palmer  and  Gomez  1987],  [Rock  and  DiVita 
1987], [Bethel-Fox and Shepard 1988], [Shepard and Metzler 1988], [Corballis 1988],
[Palmer,  Simone  and  Kube  1988],  [Georgopoulos,  Lurito,  Petrides,  Schwartz  and 
Massey  1989],  [Tarr  and  Pinker  1989]).  Our  scheme  suggests  a  computational  model 
of  how  such  an  orientation  may  be  computed,  i.e.  the  selected  orientation  is  that  of 
the most salient skeleton when it is restricted to be straight (α and ρ close to 0).

The  influence  of  the  environment  on  the  frame  has  been  extensively  studied  too 
[Mach  1914],  [Attneave  1968],  [Palmer  1980],  [Palmer  and  Bucher  1981],  [Humphreys 
1983],  [Palmer  1989].  In  some  cases  the  perception  of  the  shape  can  be  biased  by 
the  frame  of  the  environment.  In  particular,  humans  have  a  bias  for  the  vertical  in 
shape description (see [Rock 73]) so that some shapes are perceived very differently
depending on the orientation at which they are viewed; for example, a rotated square
is perceived as a diamond (see Figure 23). This bias can be taken into account in our
scheme  by  adding  some  constant  value  to  the  inertia  surface  that  corresponds  to  the 
vertical  orientation  so  that  vertical  curves  receive  a  higher  saliency  value.  Adding  the 
bias  towards  the  vertical  is  also  useful  because  it  can  handle  non-elongated  objects 
that  are  not  symmetric,  so  that  the  preferred  frame  is  a  vertical  axis  going  through 
the center of the shape*.

In other cases, the preferred frame is defined by the combination of several
otherwise non-salient frames. This is the case in Mach’s demonstration, first described by
E. Mach at the beginning of this century (see Figure 15). Our scheme incorporates
this behavior because the best curve can be extended beyond one object, increasing
the saliency of one axis by the presence of objects nearby, especially when the objects
have salient aligned axes. This example also illustrates the tolerance of the scheme
to fragmented shapes.

The shape of the frame has received very little attention. [Subirana-Vilanova 1990]
proposed that in some cases, a curved frame might be useful (see also Figure 2 and
[Palmer 1989]). In particular, he proposed to recognize elongated curved objects by
unbending them using their main curved axis as a frame to match the unbent
versions. [Subirana-Vilanova and Richards 1991] have shown that such a strategy is
not always used by the human visual system.

* As discussed in section 3, another alternative is to define a specific computation to handle the
portions of the shapes that are circular [Fleck 86], [Brady and Scott 88].

In figure-ground segregation, reference frame computation and perceptual
organization it is well known that humans prefer symmetric regions over those that are
not (see Figure 7 and references above*). Symmetric regions can be discerned in our
scheme by looking for the points in the image with higher skeleton saliency values.
However, [Kanizsa and Gerbino 76] have shown that in some cases convexity may
override symmetry (see Figure 7). Convexity information can be introduced in the
inertia  surfaces  by  looking  at  the  distances  to  the  shape  and  at  the  convexity  at 
these  points  so  that  frames  inside  a  convex  region  receive  a  higher  symmetry  value. 
Observe  that  the  relevant  scale  of  the  convexity  at  each  point  can  be  determined  by 
the  distances  to  the  shape  R  and  r. 

The location of the frame of reference [Richards and Kaufman 1969], [Kaufman
and Richards 1969], [Carpenter and Just 1978], [Cavanagh 1978, 1985], [Palmer 1983],
[Nazir and O’Regan 1990] is related to attention and eye movements [Yarbus 1967]
and influences figure-ground relations (e.g. Figure D9 in [Shepard 1990]). We have
shown how certain salient structures and individual points can be selected in the
image using the Skeleton Sketch. Subsequent processing stages can be applied
selectively to the selected structures, endowing the system with a capacity similar to the
use of selective attention in human vision. The points provided by the Skeleton Sketch
are in locations central to some structures of the image and could guide processing
in a way similar to the direction of gaze in humans (e.g. [Yarbus 1967]).

[Palmer 1983] studied the influence of symmetry on figural goodness. He
computed a “mean goodness rating” associated to each point inside a figure. For a square
(see Figure 4 in [Palmer 1983]), he found a distribution similar to that of the
skeleton sketch shown in Figure 14. The role of this measure is unclear but our scheme
suggests  that  it  can  be  computed  bottom-up  and  hence  play  a  role  in  the  recognition 
of  the  shape. 

Perhaps this measure is involved in providing translation invariance so that
objects are first transformed into a canonical position. This suggestion is similar to
others that attempt to explain rotation invariance (see references above) and it could
be tested in a similar way. For example, one can compute the time to learn/recognize
an object (from a class sharing a similar property such as the one shown in Figure 20)
in terms of a given displacement in fixation point (or orientation in the references
above).

* The role of symmetry has also been studied for random dot displays [Barlow and Reeves 1979],
[Barlow 1982] and occlusion [Rock 84].


Figure 18: Top Center: Figure is often seen as shown on the right (and ground
as on the left) due to vertical bias. Bottom Center: Preference for the vertical,
and preference for large objects, is overridden here by the preference for small
structures (after [Rock 1985]). The network presented in this paper would find
the left object as figure due to its preference for large structures. Further research
is necessary to clarify when small structures are more salient.


Figure 19: As in the previous Figure, small structures define the object
depicted in this image. Drawing from Miró. This image would confuse the network
presented in [Sha’ashua and Ullman 1988].


8  What’s  New 


In  this  paper  we  have  presented  C.I.F.  (Curved  Inertia  Frames),  a  novel  scheme  to 
compute  curved  symmetry  axes.  Previous  schemes  either  use  global  information,  but 
compute  only  straight  axes,  or  compute  curved  axes  and  use  only  local  information. 




Figure 20: Does the time to recognize/learn these objects depend on the fixation
point?  (See  text  for  details.) 
Figure 21: Left: Text image. Center: Output of the convolution of the text image
with an elongated horizontal Gabor filter. Right: Most salient curves.


The  scheme  presented  in  this  paper  can  extract  curved  symmetry  axes  and  use  global 
information.  This  gives  the  scheme  some  clear  advantages  over  previous  ones,  such 
as: 1) it can compute curved axes, 2) it provides connected axes, 3) it is remarkably
stable to changes in the shape, 4) it provides a measure associated with the relevance
of the axes in the shape, which can be used for shape description and for grouping
based on symmetry and convexity, 5) it can tolerate noisy and spurious data, and 6) it
provides central points of the shape.

We have suggested a novel scheme to recognize elongated flexible objects by
“unbending” them using C.I.F. and demonstrated the “unbending” transformation on
the  simple  shapes  shown  in  Figure  2.  This  is  useful  because  flexible  objects  can  be 
matched  as  rigid  ones  once  they  have  been  transformed  to  the  canonical  straight 
orientation.  In  fact,  the  canonical  orientation  need  not  be  straight.  If  the  objects 
generally  deviate  from  a  circular  arc  then  the  canonical  representation  could  store 
the  object  with  a  circular  principal  axis. 


We believe that the Skeleton Sketch and its associated curves and interest points
can also be used for part segmentation, attention, figure-ground segmentation,
perceptual organization, recognition and feature detection. However, further research is
necessary to support this.

Figure 22: This Figure provides evidence that a salient blob in one image might
not be so when other elements are introduced. By fixating at the X try to identify
the letter N on the left and on the right of the image. The one on the left is not
identifiable. We contend that this is due to the fact that the human visual system
selects the larger scale in the left case, yielding a horizontal blob. This example
is due to J. Lettvin (reference taken from [Ullman 84]).

The  Skeleton  Sketch  suggests  a  way  in  which  interest  points  can  be  computed 
bottom-up,  and  hence  that  they  might  be  useful  as  anchor  structures  for  aligning 
model  to  object.  It  also  provides  a  continuous  measure  that  can  be  used  to  determine 
the  distance  from  the  center  of  the  object,  suggesting  a  number  of  experiments.  For 
example, one could test whether the time to learn/recognize an object depends on
the  fixation  point  in  a  similar  way  in  which  a  dependence  has  been  found  in  human 
perception  between  object  orientation  and  recognition  time/accuracy  (see  references 
above).  This  could  be  done  on  a  set  of  similar  objects  of  the  type  shown  in  Figure  20. 

We have introduced the inertia surfaces and the tolerated length and we have
shown how they can be used to find skeletons using a sophisticated version of an
algorithm presented previously [Sha’ashua and Ullman 88]. In the Appendix we
show some limitations on the functions that can be optimized using such an algorithm.
Similar measures might be used to find skeletons using other algorithms such as those
presented in [Kass, Witkin and Terzopoulos 88] and [Zucker, Dobbins and Iverson
89].

Figure 23: A square has four symmetry axes, all of which could potentially be
used to describe it. Depending on which one of them is chosen this shape appears
as a square or as a diamond. This suggests that when there is ambiguity the
vertical can play an important role. The two trapezoids on the right further
illustrate that even when a shape has several symmetry axes the vertical might
be preferred even if it does not correspond to a perfect symmetry axis. Observe
that the vertical might be overridden by an exterior frame which can be defined
by the combination of several otherwise non-salient frames from different shapes,
as in Mach’s demonstration (see Figure 8).


The  network  presented  in  this  paper  computes  skeletons  in  2  dimensional  images. 
The  network  can  be  extended  to  finding  3  dimensional  skeletons  from  3  dimensional 
data  since  the  local  estimates  for  orientation  and  curvature  can  be  found  in  a  similar 
way and the network extends to 3 dimensions; this, of course, at the cost of increasing
the  number  of  processors.  The  problem  of  finding  3D  skeletons  from  2D  images  is 
more  complex;  however,  in  most  cases  the  projection  of  the  3D  skeleton  can  be  found 
by  working  on  the  2D  projection  of  the  shape,  especially  for  elongated  objects  (see 
shapes  in  [Snodgrass  and  Vanderwart  1980]). 

The  scheme  presented  in  this  paper  has  two  important  limitations.  First,  it  relies 
on  discontinuities.  This  can  be  overcome,  to  a  certain  extent,  by  extending  the  scheme 
to  finding  high,  long  and  smooth  curves  in  arbitrary  surfaces  (but  see  [Subirana- 
Vilanova  and  Sung  1992]).  The  scheme,  as  presented  here,  searches  for  the  best 
curve  using  local  estimates  for  orientation  and  curvature.  The  estimates  can  be 
obtained in an arbitrary surface by convolving it with oriented Gabor filters at different
orientations and scales. This could be applied to many tasks in vision. An example
of such applications is finding dark blobs in images (see Figure 21); the scheme selects
both a region and a scale in the image.

The  second  limitation  is  that  it  has  a  bias  for  large  structures.  This  is  generally 
a  good  rule,  even  for  human  perception,  except  in  some  cases,  see  Figures  19,  18 
and  22.  The  example  of  Figure  18  provides  evidence  that  the  preference  for  small 
objects cannot be only due to pop-out effects. A naive solution would be to rank
the  scale  of  the  different  contours  in  the  image  and  find  the  salient  ones  in  terms  of 
their  location  in  such  ordering.  This  distinction  had  not  been  made  clear  before  and 
deserves  further  treatment. 
Appendix I


In  the  appendix  we  show  that  the  set  of  possible  saliency  measures  that  can  be 
computed with the network defined in [Sha’ashua and Ullman 88] (see also section
4)  is  limited. 


Proposition  1  The  use  of  more  than  one  state  variable  in  the  saliency  network 
defined  in  section  4  does  not  increase  the  set  of  possible  saliency  functions  that  can 
be  computed  with  the  network. 


Proof: The notation used in the proof will be the one used in section 4. We will do
the proof for the case of two state variables; the generalization of the proof to more
state variables follows naturally. Each edge will have a saliency state variable S_ij
and an auxiliary state variable a_ij, and two functions to update the state variables:
$S_{ij}(n+1) = \max_k F(p, S_{jk}(n), a_{jk}(n))$ and
$a_{ij}(n+1) = G(p, S_{jk}(n), a_{jk}(n))$. We
will show that for any pair of functions F and G either they can be reduced to one
function or there is a network for which they do not compute the optimal curves.

If F does not depend on its last argument a_jk then the decision of what is the
most  salient  curve  is  not  affected  by  the  introduction  of  more  state  variables  so  we 
can  do  without  them.  Observe  that  we  might  still  use  the  state  variables  to  compute 
additional  properties  of  the  most  salient  curve  without  affecting  the  actual  shape  of 
the  computed  curve. 
If F does depend on its last argument then there exist some p, x, y and w ∈ R
such that $F(p, y, x) < F(p, y, w)$. Assuming continuity this implies that there
exists some ε > 0 such that $F(p, y, x) < F(p, y - \epsilon, w)$. Assume now two curves
of length n starting from the same edge e_ij such that s1_ij(n) = y, a1_ij(n) = x,
s2_ij(n) = y - ε and a2_ij(n) = w. If the algorithm were correct at iteration n it
would have computed the values s1_ij(n) = y, a1_ij(n) = x for the variables S_ij and
a_ij. But then at iteration n+1 the saliency value computed for an edge e_ki would be
S_ki(n+1) = F(p, y, x) instead of F(p, y - ε, w), which corresponds to a curve with a higher
saliency value. □